Solving Semi-Markov Decision Problems using Average Reward Reinforcement Learning

Authors

  • Tapas K. Das
  • Abhijit Gosavi
  • Sridhar Mahadevan
  • Nicholas Marchalleck
Abstract

A large class of problems of sequential decision making under uncertainty, in which the underlying probability structure is a Markov process, can be modeled as stochastic dynamic programs (referred to, in general, as Markov decision problems or MDPs). However, the computational complexity of the classical MDP algorithms, such as value iteration and policy iteration, is prohibitive and can grow intractably with the size of the problem and its related data. Furthermore, these techniques require, for each action, the one-step transition probability and reward matrices, which are often unrealistic to obtain for large and complex systems. Recently, there has been much interest in a simulation-based stochastic approximation framework called reinforcement learning (RL) for computing near-optimal policies for MDPs. RL has been applied successfully to very large problems, such as elevator scheduling and dynamic channel allocation in cellular telephone systems. In this paper, we extend RL to a more general class of decision tasks referred to as semi-Markov decision problems (SMDPs). In particular, we focus on SMDPs under the average-reward criterion. We present a new model-free RL algorithm called SMART (Semi-Markov Average Reward Technique) and study it in detail on a combinatorially large problem: determining the optimal preventive maintenance schedule of a production inventory system. Numerical results from both the theoretical model and the RL algorithm are presented and compared.
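
The abstract describes SMART only at a high level. The sketch below illustrates, under stated assumptions, what a SMART-style model-free average-reward update for an SMDP can look like: the action-value target subtracts the average reward accrued over the sojourn time, and the average-reward rate is estimated from simulation. The environment object env, its reset()/step() interface (which is assumed to return a sojourn time alongside the reward), and the fixed learning-rate and exploration parameters are all hypothetical; the paper itself should be consulted for the exact update rule and exploration schedule.

    import random
    from collections import defaultdict

    def smart_sketch(env, actions, alpha=0.1, epsilon=0.1, max_steps=100000):
        # Hypothetical interface: env.reset() -> state,
        # env.step(state, action) -> (next_state, immediate_reward, sojourn_time).
        R = defaultdict(float)        # action-value estimates R(s, a)
        rho = 0.0                     # running estimate of the average reward rate
        total_reward = 0.0
        total_time = 0.0
        state = env.reset()

        for _ in range(max_steps):
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                action = random.choice(actions)
                greedy = False
            else:
                action = max(actions, key=lambda a: R[(state, a)])
                greedy = True

            next_state, reward, tau = env.step(state, action)

            # Average-reward SMDP target: immediate reward minus the average
            # reward accrued over the sojourn time tau, plus the best next value.
            best_next = max(R[(next_state, a)] for a in actions)
            target = reward - rho * tau + best_next
            R[(state, action)] += alpha * (target - R[(state, action)])

            # Update the average-reward-rate estimate on greedy steps only,
            # as cumulative reward divided by cumulative simulated time.
            if greedy:
                total_reward += reward
                total_time += tau
                rho = total_reward / total_time

            state = next_state

        return R, rho

In descriptions of SMART in the literature, the average-reward estimate is updated only on greedy (non-exploratory) steps, and both the learning rate and the exploration rate are decayed over time; the sketch keeps them fixed for brevity.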


Similar articles

Optimal Nudging: Solving Average-Reward Semi-Markov Decision Processes as a Minimal Sequence of Cumulative Tasks

This paper describes a novel method for solving average-reward semi-Markov decision processes by reducing them to a minimal sequence of cumulative-reward problems. The usual solution methods for this type of problem update the gain (optimal average reward) immediately after observing the result of taking an action. The alternative introduced, optimal nudging, relies instead on setting the gain t...

Approximate Policy Iteration for Semi-Markov Control Revisited

Semi-Markov decision processes can be solved via reinforcement learning without generating the transition model. We briefly review the existing algorithms based on approximate policy iteration (API) for solving this problem under the discounted and average-reward criteria over the infinite horizon. API techniques have recently attracted significant interest in the literature. We first present and analyze a...

Self-Improving Factory Simulation using Continuous-time Average-Reward Reinforcement Learning

Many factory optimization problems, from inventory control to scheduling and reliability, can be formulated as continuous-time Markov decision processes. A primary goal in such problems is to find a gain-optimal policy that minimizes the long-run average cost. This paper describes a new average-reward algorithm called SMART for finding gain-optimal policies in continuous-time semi-Markov decision...

Semi-Markov Adaptive Critic Heuristics with Application to Airline Revenue Management

The adaptive critic heuristic has been a popular algorithm in reinforcement learning (RL) and approximate dynamic programming (ADP) alike, and is one of the earliest RL and ADP algorithms. RL and ADP algorithms are particularly useful for solving Markov decision processes (MDPs) that suffer from the curses of dimensionality and modeling. Many real-world problems, however, tend to be semi-Markov decis...

Learning in Average Reward Stochastic Games: A Reinforcement Learning (Nash-R) Algorithm for Average Reward Irreducible Stochastic Games

A large class of sequential decision-making problems under uncertainty with multiple competing decision makers can be modeled as stochastic games. Stochastic games can be considered multiplayer extensions of Markov decision processes (MDPs). In this paper, we develop a reinforcement learning algorithm to obtain average-reward equilibria for irreducible stochastic games. In our ...


Journal title:

Volume   Issue

Pages  -

Publication date: 1999